 model tree


Experiments with Optimal Model Trees

Roselli, Sabino Francesco, Frank, Eibe

arXiv.org Artificial Intelligence

Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to "classic" decision trees with constant values in their leaves, model trees can use linear combinations of predictor variables in their leaf nodes to form predictions, which can help achieve higher accuracy and smaller trees. Typical algorithms for learning model trees from training data work in a greedy fashion, growing the tree in a top-down manner by recursively splitting the data into smaller and smaller subsets. Crucially, the selected splits are only locally optimal, potentially rendering the tree overly complex and less accurate than a tree whose structure is globally optimal for the training data. In this paper, we empirically investigate the effect of constructing globally optimal model trees for classification and regression with linear support vector machines at the leaf nodes. To this end, we present mixed-integer linear programming formulations to learn optimal trees, compute such trees for a large collection of benchmark data sets, and compare their performance against greedily grown model trees in terms of interpretability and accuracy. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines. Our results show that optimal model trees can achieve competitive accuracy with very small trees. We also investigate the effect on accuracy of replacing axis-parallel splits with multivariate ones, forgoing interpretability while potentially obtaining greater accuracy.
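
To make the contrast between constant leaves and linear leaves concrete, here is a minimal sketch of a depth-1 tree for one-dimensional regression. It is an illustration under stated assumptions, not the paper's method: it searches one axis-parallel split exhaustively and fits ordinary least-squares lines in the leaves, whereas the paper uses mixed-integer linear programming and linear support vector machines. All function names (`fit_line`, `fit_constant`, `best_stump`, `predict`) and the toy data are hypothetical.

```python
# Toy depth-1 tree for 1-D regression with either linear or constant leaves.
# Assumption: least-squares leaves and exhaustive threshold search stand in
# for the paper's MILP formulation with linear SVM leaves.

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_constant(xs, ys):
    """Constant leaf, as in a classic regression tree: predict the mean."""
    my = sum(ys) / len(ys)
    return 0.0, my, sum((y - my) ** 2 for y in ys)

def best_stump(xs, ys, leaf_fit, min_leaf=2):
    """Try every threshold between consecutive x values; keep the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        lx, ly = zip(*pairs[:i])
        rx, ry = zip(*pairs[i:])
        left, right = leaf_fit(lx, ly), leaf_fit(rx, ry)
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
                "left": left[:2],
                "right": right[:2],
                "sse": total,
            }
    return best

def predict(stump, x):
    """Route x to a leaf by the threshold, then apply that leaf's line."""
    a, b = stump["left"] if x <= stump["threshold"] else stump["right"]
    return a * x + b

# Piecewise-linear toy data with a break between x = 4 and x = 5.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 if x < 5 else -x + 20 for x in xs]

model_stump = best_stump(xs, ys, fit_line)      # linear leaves
const_stump = best_stump(xs, ys, fit_constant)  # constant leaves
print(model_stump["threshold"], model_stump["sse"], const_stump["sse"])
```

On this toy data, a single split with linear leaves reproduces the targets, while the same tree shape with constant leaves leaves residual error and would need further splits to compensate.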


Learning on Model Weights using Tree Experts

Horwitz, Eliahu, Cavia, Bar, Kahana, Jonathan, Hoshen, Yedid

arXiv.org Artificial Intelligence

The increasing availability of public models begs the question: can we train neural networks that use other networks as input? Such models allow us to study different aspects of a given neural network, for example, determining the categories in a model's training dataset. However, machine learning on model weights is challenging as they often exhibit significant variation unrelated to the models' semantic properties (nuisance variation). Here, we identify a key property of real-world models: most public models belong to a small set of Model Trees, where all models within a tree are fine-tuned from a common ancestor (e.g., a foundation model). Importantly, we find that within each tree there is less nuisance variation between models. Concretely, while learning across Model Trees requires complex architectures, even a linear classifier trained on a single model layer often works within trees. While effective, these linear classifiers are computationally expensive, especially when dealing with larger models that have many parameters. To address this, we introduce Probing Experts (ProbeX), a theoretically motivated and lightweight method. Notably, ProbeX is the first probing method specifically designed to learn from the weights of a single hidden model layer. We demonstrate the effectiveness of ProbeX by predicting the categories in a model's training dataset based only on its weights. Excitingly, ProbeX can also map the weights of Stable Diffusion into a shared weight-language embedding space, enabling zero-shot model classification.


On the Origin of Llamas: Model Tree Heritage Recovery

Horwitz, Eliahu, Shul, Asaf, Hoshen, Yedid

arXiv.org Artificial Intelligence

The rapid growth of neural network models shared on the internet has made model weights an important data modality. However, this information is underutilized as the weights are uninterpretable, and publicly available models are disorganized. Inspired by Darwin's tree of life, we define the Model Tree, which describes the origin of models, i.e., the parent model that was used to fine-tune the target model. As in the natural world, the tree structure is unknown. In this paper, we introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in the ever-growing universe of neural networks. Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights. Beyond the immediate application of model authorship attribution, MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines. Practically, for each pair of models, this task requires: i) determining if they are related, and ii) establishing the direction of the relationship. We find that certain distributional properties of the weights evolve monotonically during training, which enables us to classify the relationship between two given models. MoTHer recovery reconstructs entire model hierarchies, represented by a directed tree, where a parent model gives rise to multiple child models through additional training. Our approach successfully reconstructs complex Model Trees, as well as the structure of "in-the-wild" model families such as Llama 2 and Stable Diffusion.
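
The two pairwise sub-tasks (relatedness and direction) can be sketched with a simple greedy heuristic. This is not the authors' algorithm. It assumes, purely for illustration, that relatedness is measured by Euclidean distance between flattened weight vectors, and that a scalar statistic `stat` (hypothetical, supplied by the caller) grows monotonically with training, so a parent always has a smaller value than its children. The function names and toy data are likewise made up.

```python
import math

def l2(u, v):
    """Euclidean distance between two flattened weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def recover_tree(weights, stat):
    """Greedy sketch of tree recovery.

    weights: dict mapping model name -> list of floats (flattened weights).
    stat:    dict mapping model name -> hypothetical scalar assumed to grow
             with training; it fixes the direction of each edge.
    Returns a dict mapping each model to its inferred parent (root -> None).
    """
    # Models with less training come first, so edges point from earlier to later.
    order = sorted(weights, key=lambda m: stat[m])
    parents = {order[0]: None}
    for i, child in enumerate(order[1:], start=1):
        candidates = order[:i]
        # The closest earlier model is taken as the parent.
        parents[child] = min(candidates, key=lambda p: l2(weights[p], weights[child]))
    return parents

# Toy "models": a base, two fine-tunes of it, and a fine-tune of a fine-tune.
weights = {
    "base": [0.0, 0.0],
    "ft_a": [1.0, 0.0],
    "ft_b": [0.0, 1.0],
    "ft_a2": [2.0, 0.0],
}
stat = {"base": 0.0, "ft_a": 1.0, "ft_b": 1.0, "ft_a2": 2.0}

tree = recover_tree(weights, stat)
print(tree)
```

A greedy nearest-earlier-model rule like this can fail when the statistic is noisy or when siblings are closer to each other than to their parent. It is meant only to show how a distance and a monotone statistic together determine a directed tree.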


Boosting-Based Sequential Meta-Tree Ensemble Construction for Improved Decision Trees

Maniwa, Ryota, Ichijo, Naoki, Nakahara, Yuta, Matsushima, Toshiyasu

arXiv.org Artificial Intelligence

Decision trees are among the most popular approaches in machine learning. However, they suffer from overfitting caused by overly deep trees. The recently proposed meta-tree addresses this problem and, moreover, guarantees statistical optimality based on Bayes decision theory. The meta-tree is therefore expected to perform better than a decision tree. It is also known that ensembles of decision trees, which are typically constructed by boosting algorithms, improve predictive performance more effectively than a single decision tree. Ensembles of meta-trees are thus expected to improve predictive performance more than a single meta-tree, yet no previous study has constructed multiple meta-trees by boosting. In this study, we therefore propose a method to construct multiple meta-trees using a boosting approach. Through experiments with synthetic and benchmark datasets, we compare the performance of the proposed methods with that of conventional methods using ensembles of decision trees. Furthermore, while ensembles of decision trees can overfit just as a single decision tree can, the experiments confirmed that ensembles of meta-trees can prevent overfitting due to tree depth.


DiffusionGPT: LLM-Driven Text-to-Image Generation System

Qin, Jie, Wu, Jie, Chen, Weifeng, Ren, Yuxi, Li, Huixia, Wu, Hefeng, Xiao, Xuefeng, Wang, Rui, Wen, Shilei

arXiv.org Artificial Intelligence

Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists: current text-to-image systems are often unable to handle diverse inputs, or are limited to single-model results. Current unified attempts often address one of two orthogonal aspects: i) parsing diverse prompts at the input stage; ii) activating an expert model to produce the output. To combine the best of both worlds, we propose DiffusionGPT, which leverages Large Language Models (LLMs) to offer a unified generation system capable of seamlessly accommodating various types of prompts and integrating domain-expert models. DiffusionGPT constructs domain-specific Trees for various generative models based on prior knowledge. When provided with an input, the LLM parses the prompt and employs the Trees-of-Thought to guide the selection of an appropriate model, thereby relaxing input constraints and ensuring exceptional performance across diverse domains. Moreover, we introduce Advantage Databases, where the Tree-of-Thought is enriched with human feedback, aligning the model selection process with human preferences. Through extensive experiments and comparisons, we demonstrate the effectiveness of DiffusionGPT, showcasing its potential for pushing the boundaries of image synthesis in diverse domains.


Cracking the Black Box: Distilling Deep Sports Analytics

Sun, Xiangyu, Davis, Jack, Schulte, Oliver, Liu, Guiliang

arXiv.org Machine Learning

This paper addresses the trade-off between Accuracy and Transparency for deep learning applied to sports analytics. Neural nets achieve great predictive accuracy through deep learning, and are popular in sports analytics. But it is hard to interpret a neural net model and harder still to extract actionable insights from the knowledge implicit in it. Therefore, we built a simple and transparent model that mimics the output of the original deep learning model and represents the learned knowledge in an explicit interpretable way. Our mimic model is a linear model tree, which combines a collection of linear models with a regression-tree structure. The tree version of a neural network achieves high fidelity, explains itself, and produces insights for expert stakeholders such as athletes and coaches. We propose and compare several scalable model tree learning heuristics to address the computational challenge from datasets with millions of data points.


Customized Video QoE Estimation with Algorithm-Agnostic Transfer Learning

Ickin, Selim, Fiedler, Markus, Vandikas, Konstantinos

arXiv.org Machine Learning

The development of QoE models by means of Machine Learning (ML) is challenging, among other reasons because of small datasets, a lack of diversity in user profiles in the source domain, and too much diversity in the target domains of QoE models. Furthermore, datasets can be hard to share between research entities, as the machine learning models and the user data collected in user studies may be IPR- or GDPR-sensitive. This makes a decentralized learning-based framework appealing for sharing and aggregating learned knowledge between the local models that map the obtained metrics to the user QoE, such as Mean Opinion Scores (MOS). In this paper, we present a transfer learning-based ML model training approach, which allows decentralized local models to share generic indicators on MOS to learn a generic base model, and then customize the generic base model further using additional features that are unique to those specific localized (and potentially sensitive) QoE nodes. We show that the proposed approach is agnostic to the specific ML algorithms stacked upon each other, as it does not require the collaborating localized nodes to run the same ML algorithm. Our reproducible results reveal the advantages of stacking various generic and specific models with corresponding weight factors. Moreover, we identify the optimal combination of algorithms and weight factors for the corresponding localized QoE nodes.


A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees

Broelemann, Klaus, Kasneci, Gjergji

arXiv.org Machine Learning

Machine learning algorithms aim at minimizing the number of false decisions and increasing the accuracy of predictions. However, the high predictive power of advanced algorithms comes at the cost of transparency. State-of-the-art methods, such as neural networks and ensemble methods, often result in highly complex models that offer little transparency. We propose shallow model trees as a way to combine simple and highly transparent predictive models for higher predictive power without losing the transparency of the original models. We present a novel split criterion for model trees that allows for significantly higher predictive power than state-of-the-art model trees while maintaining the same level of simplicity. This novel approach finds split points that allow the underlying simple models to make better predictions on the corresponding data. In addition, we introduce multiple mechanisms to increase the transparency of the resulting trees.


Extreme Gradient Boosting and Behavioral Biometrics

Manning, Benjamin (University of Georgia)

AAAI Conferences

As insider hacks become more prevalent, it is becoming more useful to identify valid users from inside a system rather than from the usual external entry points where exploits are used to gain entry. One of the main goals of this study was to ascertain how well Gradient Boosting could be used for prediction or, in this case, classification or identification of a specific user through the learning of HCI-based behavioral biometrics. If applicable, this procedure could be used to verify users after they have gained entry into a protected system, using data that is as human-centric as other biometrics but less invasive. For this study, an Extreme Gradient Boosting algorithm was used for training and testing on a dataset containing keystroke dynamics information. This specific algorithm was chosen because the majority of current research utilizes mainstream methods such as KNN and SVM, and the hypothesis of this study was centered on the potential applicability of ensemble-related decision or model trees. The final predictive model produced an accuracy of 0.941 with a Kappa value of 0.942, demonstrating that HCI-based behavioral biometrics in the form of keystroke dynamics can be used to identify the users of a system.


Logistic model tree - Wikipedia, the free encyclopedia

#artificialintelligence

In computer science, a logistic model tree (LMT) is a classification model with an associated supervised training algorithm that combines logistic regression (LR) and decision tree learning.[1][2] Logistic model trees are based on the earlier idea of a model tree: a decision tree that has linear regression models at its leaves to provide a piecewise linear regression model (where ordinary decision trees with constants at their leaves would produce a piecewise constant model).[1] In the logistic variant, the LogitBoost algorithm is used to produce an LR model at every node in the tree; the node is then split using the C4.5 criterion. Each LogitBoost invocation is warm-started from its results in the parent node. Finally, the tree is pruned.[3]
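
The split step can be illustrated with a small sketch. It assumes that "the C4.5 criterion" refers to the gain ratio (information gain divided by split information) evaluated for a binary threshold split on a numeric attribute. The full C4.5 procedure includes further heuristics not shown here, and the LogitBoost fitting and pruning steps are omitted. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (simplified C4.5-style)."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that sends everything one way carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    # Information gain: parent entropy minus weighted child entropy.
    info_gain = entropy(labels) - (p_left * entropy(left) + p_right * entropy(right))
    # Split information penalizes very unbalanced partitions.
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return info_gain / split_info

values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

clean_split = gain_ratio(values, labels, 2.5)   # separates the classes perfectly
uneven_split = gain_ratio(values, labels, 1.5)  # leaves a mixed right branch
print(clean_split, uneven_split)
```

Under this reading, the node would be split at the candidate threshold with the highest gain ratio, here the one that separates the two classes cleanly.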